Affinity-Based Probabilistic Reasoning and Document Clustering on the WWW

Authors

  • Mei-Ling Shyu
  • Shu-Ching Chen
  • Chi-Min Shu
Abstract

The World Wide Web (WWW) has become one of the fastest-growing applications on the Internet. More and more information sources are linked online through the WWW, but finding information on it remains a great challenge. For most users, the information retrieved is not well organized, and access time on the WWW is currently considered high. There is therefore a need for a good mechanism to organize and manage the tremendous volume and variety of information, so as to support search engines in retrieving information on the WWW. In response to this demand, we propose a Markov Model Mediator (MMM) mechanism that employs affinity-based data mining techniques to organize and manage the information sources so that the most relevant documents are clustered together to achieve higher recall and precision values for information retrieval on the

Similar resources

A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows

A probabilistic topic model assumes that documents are generated through a process involving topics, and then tries to reverse this process to extract the topics from the given documents. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

Document Clustering Approaches using Affinity Propagation

Document clustering is an unsupervised approach used extensively to navigate, filter, summarize, and manage large document repositories such as the World Wide Web (WWW). It is the process of segmenting a collection of texts into subgroups of documents with similar content. The purpose of document clustering is to meet human interests in informa...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...

Load-Frequency Control: a GA based Bayesian Networks Multi-agent System

Bayesian networks (BNs) provide a robust probabilistic method of reasoning under uncertainty. They have been successfully applied in a variety of real-world tasks, but they have received little attention in the area of load-frequency control (LFC). In practice, LFC systems use proportional-integral controllers. However, since these controllers are designed using a linear model, the nonlinearities...

Publication date: 2000